AITopics

Country:

North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Bulgaria > Sofia City Province > Sofia (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.56)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.49)

Neural Information Processing SystemsFeb-13-2026, 05:38:33 GMT

9d8df73a3cfbf3c5b47bc9b50f214aff-AuthorFeedback.pdf

experiment, memory index, reviewer, (10 more...)

Technology: Information Technology > Artificial Intelligence (1.00)

Neural Information Processing SystemsNov-20-2025, 20:52:12 GMT

FRAGE: Frequency-Agnostic Word Representation

Chengyue Gong, Di He, Xu Tan, Tao Qin, Liwei Wang, Tie-Yan Liu

FRAGE, we achieve higher performance than the baselines in all tasks.

artificial intelligence, machine learning, natural language, (19 more...)

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
North America > Canada > Quebec > Montreal (0.04)
(3 more...)

Genre: Research Report (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.75)
(2 more...)

Huber, Christian, Waibel, Alexander

Context Biasing for Pronunciations-Orthography Mismatch in Automatic Speech Recognition

arXiv.org Artificial IntelligenceOct-8-2025

Neural sequence-to-sequence systems deliver state-of-the-art performance for automatic speech recognition. When using appropriate modeling units, e.g., byte-pair encoded characters, these systems are in principal open vocabulary systems. In practice, however, they often fail to recognize words not seen during training, e.g., named entities, acronyms, or domain-specific special words. To address this problem, many context biasing methods have been proposed; however, for words with a pronunciation-orthography mismatch, these methods may still struggle. We propose a method which allows corrections of substitution errors to improve the recognition accuracy of such challenging words. Users can add corrections on the fly during inference. We show that with this method we get a relative improvement in biased word error rate of up to 8%, while maintaining a competitive overall word error rate.

artificial intelligence, machine learning, recognition, (16 more...)

2506.18703

Country: Europe > Germany (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.54)

Neural Information Processing SystemsOct-3-2025, 07:56:03 GMT

9d8df73a3cfbf3c5b47bc9b50f214aff-AuthorFeedback.pdf

experiment, memory index, reviewer, (10 more...)

Technology: Information Technology > Artificial Intelligence (1.00)

Kwok, Chin Yuen, yip, Jia Qi

Efficient Trie-based Biasing using K-step Prediction for Rare Word Recognition

arXiv.org Artificial IntelligenceSep-12-2025

Contextual biasing improves rare word recognition of ASR models by prioritizing the output of rare words during decoding. A common approach is Trie-based biasing, which gives "bonus scores" to partial hypothesis (e.g. "Bon") that may lead to the generation of the rare word (e.g. "Bonham"). If the full word ("Bonham") isn't ultimately recognized, the system revokes those earlier bonuses. This revocation is limited to beam search and is computationally expensive, particularly for models with large decoders. To overcome these limitations, we propose adapting ASR models to look ahead and predict multiple steps at once. This avoids the revocation step entirely by better estimating whether a partial hypothesis will lead to the generation of the full rare word. By fine-tuning Whisper with only 10 hours of synthetic data, our method reduces the word error rate on the NSC Part 2 test set from 30.86% to 12.19%.

artificial intelligence, machine learning, speech recognition, (16 more...)

doi: 10.21437/Interspeech.2025-1290

2509.09196

Country: Asia (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.36)

Kwok, Chin Yuen, Yip, Jia Qi, Chng, Eng Siong

Improving Synthetic Data Training for Contextual Biasing Models with a Keyword-Aware Cost Function

arXiv.org Artificial IntelligenceSep-12-2025

Rare word recognition can be improved by adapting ASR models to synthetic data that includes these words. Further improvements can be achieved through contextual biasing, which trains and adds a biasing module into the model architecture to prioritize rare words. While training the module on synthetic rare word data is more effective than using non-rare-word data, it can lead to overfitting due to artifacts in the synthetic audio. To address this, we enhance the TCPGen-based contextual biasing approach and propose a keyword-aware loss function that additionally focuses on biased words when training biasing modules. This loss includes a masked cross-entropy term for biased word prediction and a binary classification term for detecting biased word positions. These two terms complemen-tarily support the decoding of biased words during inference. By adapting Whisper to 10 hours of synthetic data, our method reduced the word error rate on the NSC Part 2 test set from 29.71% to 11.81%.

contextual, machine learning, natural language, (16 more...)

doi: 10.21437/Interspeech.2025-173

2509.09197

Country: Asia (0.15)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Liu, Changsong, Peng, Yizhou, Chng, Eng Siong

Zero-shot Context Biasing with Trie-based Decoding using Synthetic Multi-Pronunciation

arXiv.org Artificial IntelligenceSep-3-2025

Contextual automatic speech recognition (ASR) systems allow for recognizing out-of-vocabulary (OOV) words, such as named entities or rare words. However, it remains challenging due to limited training data and ambiguous or inconsistent pronunciations. In this paper, we propose a synthesis-driven multi-pronunciation contextual biasing method that performs zero-shot contextual ASR on a pretrained Whisper model. Specifically, we leverage text-to-speech (TTS) systems to synthesize diverse speech samples containing each target rare word, and then use the pretrained Whisper model to extract multiple predicted pronunciation variants. These variant token sequences are compiled into a prefix-trie, which assigns rewards to beam hypotheses in a shallow-fusion manner during beam-search decoding. Subsequently, any recognized variant is mapped back to the original rare word in the final transcription. The evaluation results on the LibriSpeech dataset show that our method reduces biased-word error rate (B-WER) by 43% on test-clean and 44% on test-other while maintaining unbiased-WER (U-WER) essentially unchanged.

large language model, machine learning, natural language, (17 more...)

2508.17796

Country: Asia (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.35)

arXiv.org Artificial IntelligenceAug-27-2025

A New NMT Model for Translating Clinical Texts from English to Spanish

Li, Rumeng, Wang, Xun, Yu, Hong

Translating electronic health record (EHR) narratives from English to Spanish is a clinically important yet challenging task due to the lack of a parallel-aligned corpus and the abundant unknown words contained. To address such challenges, we propose \textbf{NOOV} (for No OOV), a new neural machine translation (NMT) system that requires little in-domain parallel-aligned corpus for training. NOOV integrates a bilingual lexicon automatically learned from parallel-aligned corpora and a phrase look-up table extracted from a large biomedical knowledge resource, to alleviate both the unknown word problem and the word-repeat challenge in NMT, enhancing better phrase generation of NMT systems. Evaluation shows that NOOV is able to generate better translation of EHR with improvement in both accuracy and fluency.

artificial intelligence, natural language, translation, (17 more...)

2508.18607

Country: North America > United States > Massachusetts (0.94)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Health Care Technology > Medical Record (0.69)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

arXiv.org Artificial IntelligenceJun-17-2025

CMT-LLM: Contextual Multi-Talker ASR Utilizing Large Language Models

He, Jiajun, Sawada, Naoki, Miyazaki, Koichi, Toda, Tomoki

In real-world applications, automatic speech recognition (ASR) systems must handle overlapping speech from multiple speakers and recognize rare words like technical terms. Traditional methods address multi-talker ASR and contextual biasing separately, limiting performance in complex scenarios. We propose a unified framework that combines multi-talker overlapping speech recognition and contextual biasing into a single task. Our ASR method integrates pretrained speech encoders and large language models (LLMs), using optimized finetuning strategies. We also introduce a two-stage filtering algorithm to efficiently identify relevant rare words from large biasing lists and incorporate them into the LLM's prompt input, enhancing rare word recognition. Experiments show that our approach outperforms traditional contextual biasing methods, achieving a WER of 7.9% on LibriMix and 32.9% on AMI SDM when the biasing size is 1,000, demonstrating its effectiveness in complex speech scenarios.

artificial intelligence, large language model, natural language, (14 more...)

2506.12059

Country: North America (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)